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THE WAVEGUIDE EIGENVALUE PROBLEM AND THE TENSOR 
INFINITE ARNOLDI METHOD 

ELIAS JARLEBRING, GIAMPAOLO MELE, OLOF RUNBORG* 

Abstract. We present a new computational approach for a class of large-scale nonlinear eigen¬ 
value problems (NEPs) that are nonlinear in the eigenvalue. The contribution of this paper is 
two-fold. We derive a new iterative algorithm for NEPs, the tensor infinite Arnoldi method (TIAR), 
which is applicable to a general class of NEPs, and we show how to specialize the algorithm to a 
specific NEP: the waveguide eigenvalue problem. The waveguide eigenvalue problem arises from a 
finite-element discretization of a partial differential equation (PDE) used in the study waves prop¬ 
agating in a periodic medium. The algorithm is successfully applied to accurately solve benchmark 
problems as well as complicated waveguides. We study the complexity of the specialized algorithm 
with respect to the number of iterations m and the size of the problem n, both from a theoretical 
perspective and in practice. For the waveguide eigenvalue problem, we establish that the computa¬ 
tionally dominating part of the algorithm has complexity 0{nm 2 + y/nm 3 ). Hence, the asymptotic 
complexity of TIAR applied to the waveguide eigenvalue problem, for n —> oo, is the same as for 
Arnoldi’s method for standard eigenvalue problems. 


1. Introduction. Consider the propagation of waves in a periodic medium, 
which are governed by the Helmholtz equation 


(i.i) 

where rj £ 1° 


Av(x, z) + oo 2 ri(x, z) 2 v(x, z) = 0, (x, z) £ R 2 , 

is called the index of refraction and ui the temporal frequency. 
When (1.1) models an electromagnetic wave, the solution v typically represents the 


y-component of the electric or the magnetic field. The (spatially dependent) wavenum¬ 
ber is k(x, z) := u>r](x, z) and we assume that the material is periodic in the ^-direction 
and without loss of generality the period is assumed to be 1, i.e., p(x , z + 1) = r](x, z). 
The index of refraction is assumed to be constant for sufficiently large \x\, such that 
k(x , z) = K— when x < X-, n(x, z) = when x > x+. In this paper we assume the 
wavenumber to be piecewise constant. Figure 0 shows an example of the setup. 

Bloch solutions to (1.1) are those solutions that can be factorized as a product of 
a ^-periodic function and e 72 , i.e., 


( 1 . 2 ) 


v(x, z) = v{x, z)e 7 


)( x, z + 1) = v(x, z). 


The constant 7 £ C is called the Floquet multiplier and without loss of generality, 
it is assumed that Imy £ (—27T, 0]. We interpret (1.1) in a weak sense. We are only 


interested in Bloch solutions that decay in magnitude as |a;| —> 00 and we require that 
v, restricted to S := R x (0,1), belongs to the Sobolev space H 1 (S). Moreover, we 
assume that any Bloch solution has a representative in C ,1 (S'). These solutions are in 
general not in C 2 (S ) since k is discontinuous. 

In this contex, Bloch solutions are also called guided modes of 0. If 7 is purely 
imaginary, the mode is called propagating ; if | R,e 7I is small it is called leaky. Both 
mode types are of great interest in various settings USES EDI IE1 El- We present a 
procedure to compute leaky modes, with Re 7 < 0 and Im.7 £ (—27r, 0). This specific 
setup has been studied, e.g., in HIT] . 

To compute the guided modes one can either fix to and find 7, or, conversely, 
fix 7 and find u>. Both formulations lead to a PDE eigenvalue problem set on the 


*Dept. Mathematics, KTH Royal Institute of Technology, SeRC Swedish e-science research center, 
Lindstedtsvagen 25, Stockholm, Sweden, email: {eliasj,gmele,olofr}@kth.se 


1 







Figure 1 . 1 : Illustration of a waveguide defined on R 2 . The wavenumber k(x,z) is 
constant in the three regions separated by the thick lines. 


unbounded domain S. When 7 is held fix, the eigenvalue problem is linear and if u> is 
held fix, it is nonlinear (quadratic) in 7. In this paper we fix w, and the substitution 
of (1.2) into 0 leads to the following problem. Find (7, t>) G C x H 1 (S) such that 


( 1 . 3 a) Av(x, z) + 2 'yv z (x, z) + (7 s + k(x, z) 2 )v(x, z) = 0 , ( x,z)£S , 

( 1 . 3 b) v(x, 0 ) = v(x, 1 ), ieK, 

( 1 . 3 c) v z (x, 0 ) = v z (x, 1 ), x G R. 


The problem 0, which in this paper is referred to as the waveguide eigenvalue 
problem, is defined on an unbounded domain. We use a well-known technique to 
reduce the problem on a unbounded domain to a problem on a bounded domain. We 
impose artificial (absorbing) boundary conditions, in particular so-called Dirichlet-to- 
Neumann (DtN) maps. See [T6J [B] for literature on artificial boundary conditions. 

The DtN-reformulation and a finite-element discretization, with rectangular ele¬ 
ments generated by a uniform grid with n x and n z grid points in x and z-direction 
correspondingly, is presented in section [ 2 j A similar DtN-discretization has been ap¬ 
plied to the waveguide eigenvalue problem in the literature m- In relation to j 3 lj . 
we need further equivalence results for the DtN-operator and use a different type of 
discretization, which allows easier integration with our new iterative method. Due 
to the fact that the DtN-maps depend on 7, the discretization leads to a nonlinear 
eigenvalue problem (NEP) of the following type. Find (7, w) € C x C n \{ 0 } such that 


( 1 . 4 ) 


M(j)w = 0 , 

where 



( 1 . 5 ) 

M{ 7) := 

Q( 7) C,( 7) 

. C 2 t P( 7) 

and n = n x n z + 2 n z . 

The matrices <5(7) G C 


'"'■'•‘X".'*. and Ci(7) g C" inzX 2 % 

are a quadratic polynomials respect 7, <5(7) = A 0 + A17 + A27 2 , Ci(7) = Cqo + 
Ci,i7 + Ci, 27 2 , where Ai,C\^ and Cj are large and sparse. The matrix P(7) has the 
structure 


( 1 . 6 ) 


P( 7 ) = 


RA_( r y)R ~ 1 

0 


0 

PA + (7)P 


-1 


eC 


2 n z x2 n z 


and A ± (7) e C HiXnz are diagonal matrices containing nonlinear functions of 7 in¬ 
volving square roots of polynomials. The matrix-vector product corresponding to R 
and R can be computed with the Fast Fourier Transform (FFT). 
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We have two main contributions in this paper: 

• a new algorithm, tensor infinite Arnoldi method (TIAR), for a general class 
of NEPs (1.41, which is based on a tensor representation of the basis of the 


infinite Arnoldi method (IAR) [2D] ; 

an adaption of TIAR to the structure arising from a particular type of dis¬ 
cretization of the waveguide eigenvalue problem. 


The general NEP (1.41 has received considerable attention in the literature in 


various generality settings. We list those algorithm that are related to our method. 
See the review papers [25] [32] and the problem collection [7j, for further literature. 

Our algorithm TIAR is based on ED], but uses a compact representation of Krylov 
subspace generated by a particular structure in the basis matrix. Different compact 
representations for iterative methods for polynomial eigenvalue problems and other 
NEPs have been developed in other works. In particular, the basis matrices stemming 
from Arnoldi’s method applied to a companion linearizations of polynomial eigenvalue 
problems can be exploited by using reasoning with the Arnoldi factorization. This 
has been done for Arnoldi methods in [23] [34, I| and rational Krylov methods [5]. 
The approach |23] is designed for polynomial eigenvalue problems expressed in a 
Chebyshev basis, makes it particularly suitable to use in a two stage-approach, which 
is done in m, where the eigenvalues of interest lie in a predefined interval and (a 
non-polynomial) M can first be approximated with interpolation on a Chebyshev 
grid and subsequently the polynomial eigenvalue problem can be solved with [23] . 
The algorithm in [34] is mainly developed for moment-matching in model reduction 
of time-delay systems, where the main goal is to compute a subspace (of C") with 
appropriate approximation properties. We stress that the algorithm in the preprint 
[5], which has been developed in parallell independent of our work, is similar to 
TIAR in the sense that it can be interpreted as a rational Krylov method for general 
NEPs involving a compact representation. The derivation is very different (involving 
reasoning with a compact Krylov factorization as in [34] 23]), and leads to a general 
class of methods with different algorithmic properties. 

Some recent approaches for (1.4) exploit low-rank properties, e.g., AfW(O) = 
ViQ T , where Vi,Q £ C nxr for sufficiently large i, and r is small relative to n. See, 


e.g., [29] 33j '3|. This property is present here if we select r = n z = 0(y/n), which is 
not very small with respect to the size of the problem, making the low-rank methods 
to not appear favorable for this NEP. 

The (non-polynomial) nonlinearities in our approach stem from absorbing bound¬ 
ary conditions. Other absorbing boundary conditions also lead to NEPs. This has 
been illustrated in specific applications, e.g., in the simulation of optical fibers | 21 j . 
cavity in accelerator design [24], double-periodic photonic crystals mm and micro¬ 
electromechanical systems [3] . There is to our knowledge no approach that integrates 
the structure of the discretization of the PDE and the 7 -dependent boundary condi¬ 
tions with an Arnoldi method. The adaption of the algorithm to our specific PDE is 
presented in section [4] 

The notation is mostly standard. A matrix consisting of elements is denoted 


MS=i = 


oi,i 


Am, 1 


fll,, 


The notation is analogous for vectors and tensors. We use Q to denote an extension 
of Q with one block row of zeros. The size of the block will be clear by the context. 
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2. Derivation of the NEP. 


2.1. DtN reformulation. As a first step in deriving a computational approach 
to (1.31, we rephrase the problem on a bounded domain So := [x_,x + \ x [0,1] C S by 
introducing artificial boundary conditions at x = x ±. We use a construction with so- 
called Dirichlet-to-Neumann (DtN) maps which relate the (normal) derivative of the 
solution at the boundary with the function value at the boundary. The main concepts 
of DtN maps are presented in various generality settings in, e.g., nn [ 22 i nzi mi ng. 
We use the same DtN maps as in [5T i, but we use a different discretization and we 
need to derive some results necessary for our setting. 


The DtN formulation of the eigenvalue problem (1.3) is given as follows. Find 7 
and u G If 1 (So) such that 


(2.1a) Au(x, z) + 2y u z (x, z) + (y 2 + k(x, z) 2 )u{x, z) = 0, 


(x,z) G S 0 , 


( 2 . 1 b) 

(2.1c) 

( 2 -ld) 

( 2 -le) 


u(x, 0 ) = u(x, 1 ), 
u z (x, 0 ) = u z (x, 1 ), 
T- tl [u(x-, •)] = ~u x (x-, ■), 
T +;1 [u{x+, •)] = U x (x+, •), 


X G (x-, x + ), 
X G (x-, x + ), 


where T± n : ii^QO, 1 ]) L 2 ([0,1]) are the DtN maps, defined by 


( 2 . 2 ) 


T±^\g\{z) := Y s±, k {l)gke 


2ivikz 


where [gk]ke z is the Fourier expansion of g, i.e., g(z) := Y^k£i.9k e2 ™ kz and 

(2.3) / 3 ±,fc( 7 ) := (7 + 2 i 7 r/c ) 2 + n 2 ± = ((7 + 2ink) + m±)(( 7 + 2iTrk) — in±), 

(2.4) s±, fc ( 7 ) := sign(Im {P±, k {l)))i\jP±,k{l)- 


In this section we show that, under the assumption that neither the real nor the 
imaginary part of 7 vanish, the DtN maps are well-defined and the problems ( |1.3| ) and 
(2.1) are equivalent. In order to characterize the DtN maps, we consider the exterior 
problems, i.e., the problems corresponding to the domains S+ = (a;+,oo) x (0,1) and 
S_ = (—oo,a;_) x (0,1). The exterior problems are defined as the two problems 
corresponding to finding w G H 1 (S±) such that, for a given g, 


(2.5a) Aw + 2ryw z + ("/ 2 + k±)w = 0, (i,z)gS±, 

(2.5b) w(x, 0) = w(x, 1 ), 

(2.5c) w z (x, 0) = w z (x, 1), 

(2.5d) w{x±,z) = g{z). 


Remark 2.1 (Regularity). Note that if we multiply a solution to {H, & or 
(2.5) with e lz , we have a solution to the Helmholtz equation, i.e., it satisfies (1.1) 
in their respective domains, i.e. S, So and S±. By assumption, solutions to (1.3), 
(2.1) and (2.5) are C 1 and the traces taken on x = x± and its first derivatives are 
always well-defined and continuous. Moreover, for x < x_ and x > x + , since n{x, z) 
is constant, the problem can be interpreted in a strong sense and the solutions are in 
C°°. 
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Our assumption that the solution has regularity C 1 can be relaxed as follows. If we 
select x _ and x + such that n is constant over x_ and x + , we have by elliptic regularity 


m Section 6.3.1, Theorem 1], that weak H 1 solutions of ( |1.3[ ) ( |2.1[ ) and ( |2.5[ ) are 
in Hf oc . This means that traces taken on x = x± of a solution and its derivatives are 
always well-defined and smooth, without explicitly assume that the solution is in C 1 . 


The following result illustrates that the application of the DtN maps in (2.2) is 


in a sense equivalent to solving the exterior problems and evaluating the solutions in 
the normal direction at the boundary x = x±. More precisely, the following lemma 
shows that if Re 7 7 ^ 0 and Im 7 £ (—27r, 0) the problems are well-posed in H 1 (S±) 
and the boundary relations ( 2. ld[ ) and (2.1e) are satisfied. The proof is available in 
Appendix [A] 

Lemma 2.2 (Characterization of DtN maps). Suppose Rey 7 ^ 0, and Imy ^ 27 tZ 
and g £ iL 1 // 2 ([0,1]). Then, each of the exterior problems (2.5) have a unique solution 
w £ H 1 (S±). Moreover, there is a constant C independent of g such that 


( 2 . 6 ) 


MI-EfRSi) < C|l5llff 1 /2([0 1 1])- 


If it is further assumed that g £ iL 1 ([ 0,1]), then the DtN maps in (2.2) are well-defined 
and satisfy 


(2.7) 


T +lJ [w(x + , -)](z) = w x (x+,z), 71, 7 [w(x_,-)](z) = -w x (x-,z). 


This lemma immediately implies the equivalence between (1.3) and (2.1) under 
the same conditions on 7 . 


Theorem 2.3 (Equivalence of (1.3) and (2.1)). Suppose Rey 7 ^ 0 and Imy ^ 
2 - 7 tZ. Then u £ H 1 (Sq) is a solution to (2.1) if and only if there exists a solution 


v £ H 1 (S) to (1.3) such that u is the restriction of v to So- 

Proof. Suppose v is a solution of (1.3) and u is its restriction to Sq. Then u 


clearly satisfies (2.1 i-c). By Remark 2.1 the functions v(x±,z) = u(x±,z) are in 
C 1 ([0,1]) C iL 1 ([0,1]). Lemma 2.2 shows that v, restricted to S±, are the unique 


solutions to the exterior problems (2.5). Hence, v is identical to the union of u and 


the solutions to the exterior problems (2.5). Since v £ C 1 (S), we have that v x (x±,z) 
is continuous and w x (x±,z) = u x (x±,z) = v x (x±,z). Moreover, due to ( |2.7| ), the 
boundary conditions @1 -e) are satisfied. 


2.1 


On the other hand, suppose u £ H 1 (Sq) is a weak solution to ( 2 . 1 ). Remark 
again implies that u £ C 1 (5'o) and in particular u(x±,z) £ C 1 ([0,1]) C i7 1 ([0,1]). 


We have from Lemma 2.2 that the exterior problems (2.5) have unique solutions w 


that satisfy (2.7). Let v be defined as the union of the u and w. The union v has 


a continuous derivative on the boundary x = x± due to (2.1 )d-e and (2.7) and since 


u £ iL 1 (S'o) an d w £ H 1 (S±), then v £ H 1 (S) and satisfies (1.3) by construction. □ 


Remark 2.4 (Conditions on y). Modes with Rey = 0 are propagating. For those 
modes, the well-posedness of the DtN-maps depends on the wave number. See M 
for precise results about well-posedness in the situation Rey = 0. In our setting we 
only consider leaky modes and Rey < 0. The situation Rey > 0 can be treated 
analogously. 


2.2. Discretization. We discretize the finite-domain PDE (2.1) with a finite- 


element approach. The domain [x_,x+] x [0,1] is partitioned using rectangular ele¬ 
ments obtained with a uniform distribution of nodes in the x and z directions. We 
use n x grid points in the x-direction and n z grid points in the z-direction and de¬ 
fine Xi = x- + ih x and Zj = jh z where i = 1 ,... ,n x , j = 1 ,... ,n z , h x = 


— E+l 


n x +1 
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and h z = —. The basis functions are chosen as piecewise bilinear functions that are 
periodic in the z direction with period 1. In particular, the basis functions that we 
consider are periodic modification of the standard basis functions. We denote them 
as 1 ,{<j>±,j(x,z)} n * where <j> it j(x t ,z a ) = S itS S jtt , 4>-j(x-,z s ) = 5 jiS 


and (j>+j(~x+, z s ) = Sj yS . The finite-domain PDE (2.11 can be rewritten in weak form 
(2.8) a(u, <j>) + jb(u, <j>) + 7 2 c(u, <j>) = 0 \/4>eH 1 (S 0 ) 


where a,b and c are bilinear operators. The approximation of a solution of (2.8) can 
be expressed as 

n x n z n z n z 

(2.9) u(x,z) = 'Y^2,u i ^ j (x,z) +Y^u- tj ^- ij (x,z) + '^2u+j(t) +ij (x,z). 
i— 1 j—1 j=l j =1 

We represent the coefficients that defines u(x, z) in a compact way 



^ 1,1 • 

u n x , 1 


U-, 1 


u+ t 1 

u = vec 

^l,n z 

^ j n x ,n z _ 

, u- = 


, 11+ = 



such that the Ritz-Galerkin discretization of ( |2.8| ) leads to the following relation 
( 2 . 10 ) 
where 


Q(7)u + Ci(7) 


U- 

u+ 


= 0 , 


Q(t) : = A o + l A i + 7 2 ^2, 

Cr(7) :=C' 1 ,o+7Ci,i+7 2 Ci,2. 

The matrices (Aj)?_ 0 and (C'i ) j)|_ 0 can be computed in an efficient and explicit waj0 
with the procedure outlined in Appendix |B| 

Two approximations must be done in order to incorporate the boundary con¬ 


ditions. We construct approximations of the right-hand side of (2.1 )d-e using the 
one-sided second-order finite-difference approximation, 


—— d 0 u+ 


and Cl + = ((0 ,..., 0, d 2l di)®I n J £ 
with do = — , d\ = ^- and = -jF- The DtN maps in the left-hand 



1 

r 

a 

1 

1 _ 


U x {x+,Zi) 

(2.11) 


~ —— doU- and 



-U x (x-,z n j_ 


_u x {x+,z n J_ 


where Cj_ = (di, d 2 ,0,..., 0)®I„ z £ C 
= — 2ST> = F 


side of (2.1 i-e) act on the function values on the boundary only, i.e., the function 
approximated by u± . We compute the first p Fourier coefficients of the approximated 
function, apply the definition of 7± )7 on the Fourier coefficients, and convert the 
Fourier expansion back to the uniform grid. More precisely, the approximation of the 


left-hand side of (2.1 )d-e is given by 
(2.12) \t±^(u{x±,z))\ z=z 


i=1 


RL±( , y)R 1 u± £ 


lr The matrices are available online in order to make the results easily reproducible: http:// 
people.kth.se/~gmele/waveguide/ 
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where L±('y) = diag([sy±(7)]^ = _ p ), and R = [exp(2*7r jzk)]k=i,j=- p with n z = 2p+l. 
In the algorithm we exploit that the action of R and I? -1 can be computed with 


FFT. We match (2.12) and (2.11) and get a discretization of the boundary condition 


( |2.1 )d-e. That is, we reach the NEP (1.4), with M given by (1.5) if we define C 2 := 
Ci- C2.+I, A±(y) := L±+doI and w 1 := \u T u£ u+] and combine (2.10) with 


(2.12) and (2.11). 


3. Derivation and adaption of TIAR. 

3.1. Basis matrix structure of the infinite Arnoldi method (IAR). There 
exists several variations of IAR, mm- We use the variant of IAR in [2D] called 
the Taylor variant, as it is based on the Taylor coefficients (derivatives) of M. We 
briefly summarize the algorithm and characterize a structure in the basis matrix. 
Similar to the standard Arnoldi method, IAR is an algorithm with an algorithmic 
state consisting of a basis matrix Qk and a Hessenberg matrix H The basis matrix 
and the Hessenberg matrix are expanded in every loop. Unlike the standard Arnoldi 
method, in IAR, the basis matrix is expanded by a block row as well as a column, 
leading to a basis matrix with block triangular structure, where the leading (top left) 
submatrix of the basis matrix is the basis matrix of the previous loop. More precisely, 
there exist vectors qij £ C n , i,j = l,...,k such that 


(3.1) 


Qk = 


91,1 91,2 

0 92,2 


0 


0 


9i .k 


Qk,k 


In every loop in IAR we must compute a new vector to be used in the expansion 
of Qk and Hk . In practice, in iteration k, this reduces to computing y\ £ C n given 
2/2, - - • ,2/fc+i such that 


( k 

£m«(0) W+1 

*=1 

Clearly, since M(0) does not change throughout the iteration, and we can compute 
an LU-factorization before starting the algorithm, such that the linear system can be 
solved efficiently in every iteration. IAR (Taylor version) is for completeness given by 
algorithm [T] 

Steps 3-9 of Algorithm [l] are visualized in Figure [3~T| when k = 4, i.e., after three 
iterations. We have marked those operations that are linear combinations as dashed 
lines. The fact that the many operations are linear combinations leads to a structure 
in Qk which can be exploited such that we can reduce the usage of computer resources 
(memory and computation time) and maintain an equivalence with Algorithm [lj 
More precisely, the block elements of the basis matrix Qk have the following 
structure. 

Lemma 3.1 (Structure of basis matrix). Let Qi, i = 1 ,...,fc be the sequence 
of basis matrices generated during the execution of k — 1 iterations of Algorithm [Ij 
Then, all block elements of the basis matrix Qk (when partitioned as ( |3.1 | ) ) are linear 
combinations of 914 ,..., qi,k- 

Proof. The proof is based on induction over the iteration count k. The result is 
trivial for k = 1. Suppose the results holds for some k. Due to the fact that Qk is the 
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Algorithm 1: Infinite Arnoldi method - IAR (Taylor version) [2fJ] 
input : X\ £ C n 


1 Let Q i = xi/||a;i||2, Hq =empty matrix 
for k = 1, 2,..., m do 

2 Compute y 2 , - • •, y k +x from the last column of Qk by setting 


1 

Vj ~ ■ _ 1 i,fc 


3 

4 

5 

6 

7 

8 
9 

10 

11 


for j = 2,..., k + 1. 

Compute y\ from y 2 , ■. ■, j/fc+i using (f372l) 


Let y := vec(yi,..., j/k+i) and Q k := 

Compute h = Q^y 
Compute y± = y — Q.h 


Qk 

0 


g (£>(ra(fc+l))xfc 


Possibly repeat Steps 5|6 and get new /i and y± 
Compute P=\\y ±\\ 2 
Compute q k+ i =y±/P 

H k —i h 


Let H k = 


g {^(fc+!) x fe 


0 /3 

Expand into Q fc+ i = [Q ,q k+ 1] 


end 

12 Compute the eigenvalues {/i;}Kli of the leading m x m submatrix of the 
Hessenberg matrix H k 

13 return approximations {l//ij}£i 


leading submatrix of Qk+i, as in we only need to show that the blocks of the 

new column are a linear combinations q-i. j. i,j = 1,..., k. This follows directly from 
the fact that q 2 ,k+ ij ■ ■ ■ ,Qk+i,k+i is (in step 3-9 in Algorithm [l]) constructed as linear 
combination of qi t k, ■ ■ ■, qk,k- See Figure 3.1 □ 

We note that the structure presented in Lemma |3.1| is very natural in view of sim¬ 
ilar structures in other settings [23j Section 3.1], [34} Page 1057] and [5] Theorem 4.4]. 


3.2. Derivation of TIAR. We now know from Lemma IXTI that the basis matrix 
in IAR has a redundant structure. In this section we show that this structure can 
be exploited such that Algorithm [T] can be equivalently reformulated as an iteration 
involving a tensor factorization of the basis matrix without redundancy. We present a 
different formulation involving a factorization with a tensor which allows us to improve 
IAR both in terms of memory and computation time. This equivalent, but improved, 
version of Algorithm [Tj appears to be competitive in general, and can be considerably 
specialized to the waveguide eigenvalue problem as we show in section [4j 

More precisely, Lemma |3.1| implies that there exists (iij.e for i,j,£=l,...,k such 

that 


k 

(3.3) q it j = ^2 a i , for i, j = 1, • • •, k 

i=i 


where z \,..., z k is a basis of the span of the k first columns of the first block row, 
















Figure 3.1: The computation tree representing Steps 3-9 for Algorithm [T] when k = 4, 
i.e., after three iterations. Every node is vector of size n. The dashed lines correspond 
to computing linear combinations. Clearly, the only (potentially) new direction of the 
span of all vectors can be represented by y\. 


i.e., span(gi j i,..., qi,k) = span(zi,...,z*,). Due to (3.3), the quantities [zi,...,Zfc], 
a,i,j,e for i,j,£ = 1 can be interpreted as a factorization of Qk■ For reasons 

of numerical stability we here work with an orthonormal basis zi,..., z; c , i.e., Zk := 


[zi,...,z fe ] e 


ixk 


is an orthogonal matrix. This is not a restriction if the columns of 


the first block row of Qk are linearly independent. Note that the first block row of Qk 
can only be linearly independent if k < n. This is the case for large-scale nonlinear 
eigenvalue problems, as the one we consider in this paper. 

Suppose for the moment that we have carried out k — 1 iterations of Algorit hm [T| 
From Lemma 3.1 we know that the basis matrix can be factorized according to (3.3). 


The following results show that one loop, i.e., steps 3-11, can be carried out without 


explicitly storing Qk, but instead only storing the factorization (3.3) represented by 
the tensor a,.j t e for i,j,£ = 1,... ,k and the matrix Zf z £ C nxk . Instead of carrying 
out operations on Qk that lead to Qfc-t-i , we construct equivalent operations on the 
factorization of Qk, i.e., for i,j,£ = l,...,fc and the matrix Zf c £ (C raxfe , that 

directly lead to the factorization of Qk+i, he., di,j,e for i,j, £ = 1 ,..., k + 1 and the 
matrix Zk +1 £ C nx ^ fe+1 \ without explicitly forming Qk or Qfc+i- 

To this end, suppose we have for i,j,£ = 1,..., k and z ±,..., Zk available 


after k — 1 iterations such that (3.3) is satisfied, and consider the steps 3-11 one-by- 


one. In Step 3 we need to compute the vectors y^, ■ ■ ■, Vk+i ■ They can be computed 
from the factorization of Qk, since 


(3.4) 


J \ ,h J 


l ^2 a i-bM 
1 e=i 


for j = 2,..., k+1. The vector y\ is (in Step 4) computed using (3.2) and j/ 2 , • ■ •, 2/fc+i 


and does not explicitly require the basis matrix. For reasons of efficiency (which we 
further discuss in Remark 3.3) we carry out (3.4) with an equivalent matrix-matrix 
multiplication, 


[m 


Vk+i] = ZkAk , 


(3.5) 
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where A^ = [a,i,k/]ij=i an d subsequently setting 

(3.6) yj = -—jVj for j = 2,..., k + 1. 

In order to efficiently carry out the Gram-Schmidt orthogonalization process in 
step 6-9, it turns out to be efficient to first form a new vector Zk+i, which can be used 
in the factorized representation of Qk+i- We define a new vector Zk+i via a Gram- 
Schmidt orthogonalization of y i against z\,...,Zk- That is, we compute Zk+i £ C n 
and ii,..., tk+i £ C such that 

(3.7) 2/i = ti^i + • ■ ■ + tk+iZk+i 

and expand Zk+i := [Zk Zk+ i] such that Zj^ +1 Zk +1 = /. 

The new vector y (formed in Step 5) can now be expressed using the factorization, 
since 


(3.8) 


y i 

i „ 


0 

i „ 



j9l ,k 

k 

j a l ,k,t 

k +1 

9i/ 

bz,k 

= ei 8 ) yi + 

\ a 2 ,k,t 

<8>ze = Y 


1 n 

_ Qk,k_ 

i=i 

1 _ 

_k a k,k/_ 

i- i 

_9k+1/_ 


where we have defined gn as 


(3.9a) 

(3.9b) 

(3.9c) 


9i,i := t e for £ = 1,..., k + 1 

9i,i '■= for i = 2,. .., k + 1, £=l,...,k, 

i— 1 

9i,k +i := 0 for i = 2,..., k + 1. 


Instead of explicitly working with y, we store the matrix [gi fiu £ C^ fe+1 ^ x ^ fc+1 \ 
representing the blocks of y as linear combinations of 

In order to derive a procedure to compute h £ C k (in Step 6) without explicitly 
using Qk , it is convenient to express the relation (3.3) using Kronecker products. We 
have 


(3.10) 




Ol.l / 


a i.ki 


Hk,l/ ' ' * Hk// 


) Z(. 


From the definition of h and (3.10) combined with (3.8) and the orthogonality of 
Z^ |-i, we can now see that h can be expressed without explicitly using Qk as follows 

(3.11) h = Q”y=\J2 


( k 


a fc, 1/ o 

\ ( k+1 

9i/' 

\ 

£ 



® E 


<g> Z# 

v- 1 

, a l,k/ ' 

a *k,k/ 0 

) v =1 

9k+l/'_ 

) 


k 

a l,l / ■ 

a k, 1/ 


9i/ 

E 





1 =1 

_ a * ,k,e ‘ 

1 

C3 


9k/_ 


to 























In Step 7 we need to compute the orthogonal completement of y with respect to Q fc . 
This can be represented without explicit use of Q k as follows 


fc+i 


( 3 . 12 ) y±=y~Q k h = Yl 


1=1 


sv 

k 

( 

o-i,i,t • • ■ «i ,k,e 

\ 

_9k+i,e_ 

®zj- £ 
t =i 

\ 

o-k, i,e • • • ®k,k,e 

0 ■■■ 0 _ 

h 

) 


<E> Zl 


k +1 

= £ 

1=1 


fl,£ 


_Jk+ l,t_ 

where we have used the elements of the matrix with columns defined by 


(3.13) 


for t = 1,..., k and /,,*+1 := g^k+i for i = 1,. .., k + 1. 

We need j 3 in Step 8, which is defined as the Euclidean norm of y±. Due to the 
orthogonality of Z k+ 1, we can also express /3 without using vectors of length n. In 
fact, it turns out that /3 is the Frobenius norm of the matrix i, since 


i 

i_ 


9i,t ' 


a 1 , 1 ,e 

ai ,k,e 

i- 

> 

+ • • 

f'ss. 

1_ 


_9k+i,e_ 


&k,i,t ■ 

0 

' O k ,k,£ 

0 


P 2 = \\y±\\ 2 = I £ 


fc+1 

= £ 

t =i 


fi,i 

fk+i,e 


H 


fi,t 

fk+i,e_ 
fi,t 

fk+i,e 


* ze 


H 

\ 

( k+1 

fl,£' 

\ 


£ 

1 

® ZHJ, 


= || [fi,j]ij=i 


,2 

Ifrob ’ 


Finally (in Step 11), we expand Q k by one column corresponding q k +i, which is 

1 k+ 


the normalized orthogonal complement. By using the introduced matrix [/, ,j)itLi we 


have that 


(3.14) 


Let us now define 


(3.15a) 

Oi,k+ 1/ — 

(3.15b) 

^k-\-l,j,£ = 

(3.15c) 

Q'i,j,k +1 = 

Hence, for i 

jj= 


1 

Qk+i = aV -l 


1 k-\-\ 


=i 


fi,e 

fk+l,t 


< 8 > zi- 


in the sense of (3.3), since the column added in comparison to the factorization of Q k 
is precisely (3.14). 
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We summarize the above reasoning with a precise result showing how the de¬ 
pendence on Q k for every step in Algorithm [l] can be removed, including how a 
factorization of Qk+ 1 can be constructed. 

Theorem 3.2 (Equivalent steps of algorithm). Let Qk be the basis matrix gener¬ 
ated by k — 1 iterations of Algorithm [7] and suppose for i,j,£= 1,..., k and Z k 

are given such that they represent a factorization of Qk of the type (3.3). The quanti¬ 
ties computed (by executing Steps 3-11) in iteration k satisfy the following relations. 

(i) 

(ii) 


The vectors y 2 , • • •, Uk+i computed in Step 3, satisfy ( |3.5[ ). 

Suppose yi (computed in Step 3) satisfies y\ £ span(2i,... ,z k ). Let Zk+i £ 
C n and t±,..., tk- i-i £ C be the result of the Gram-Schmidt process satisfying 

be defined by (3.9). Then, then h computed in 


(Hi) 


(3.7). Moreover, let 
Step 6, satisfies (3.11). 

Let [fj,e]j b e defined by (3.13). 
satisfies (3.12 1. 


Then, the vector y±, computed in Step 8, 


(iv) The scalar (3, computed in Step 9, satisfies (3 = [fi,e\i~gl 


fro 


Moreover, if we expand as in (3.15), then, cn,j,t> for i,j,£ = l,...,fc + l and 
Zi,...,Zk+i represent a factorization o/Qfc+i in the sense that (3.3) is satisfied for 

k + 1. 

The above theorem directly gives us a practical algorithm. We state it explicitly 
in Algorithm[2j The details of the (possibly) repeated Gram-Schmidt process in Step 9 
is straightforward and left out for brevity. 


Algorithm 2: Tensor infinite Arnoldi method - TIAR 
Input : x\ £ C n 


1 

2 

3 

4 


5 

6 

7 

8 
9 

10 

11 


Let Q\ = xi/||xi||2, Hq =empty matrix 
for k = 1,2,..., m do 

Compute 2/2, - - -, Vk+i from a itk ,e, i,i= and Z k using (|33|)-([T6|) 

Compute 2/1 from y 2 ,..., y k +i using ( fT2| ) 

Compute t\,... ,t k +1 and z k +i by orthogonalizing yi against Z \,..., z k 
using a (possibly repeated) Gram-Schmidt process such that (3.7) is 
satisfied. 

Compute the matrix G = using (3.9) 

Compute he C k using (3.11) 

Compute the matrix F = using (3.13) 

Possibly repeat Steps 6|7 and obtain updated h and F 

Compute /3 = ||F|| f _ 

Expand using (3.15) 

H k -i h 

0 !3 


Let H k = 


£ C( fe+1 ) xfe 


end 

12 Compute the eigenvalues {/ij}™ T of the leading m x m submatrix of the 
Hessenberg matrix H k 

13 Return approximations {1 /Hi} r )f = i 


Remark 3.3 (Computational performance of IAR and TIAR). Under the condi¬ 
tion that qi i,... ,qi m are linearly independent, Algorithm [l] (IAR) and Algorithm [2] 
(TIAR) are equivalent in exact arithmetic. The required computational resources of 
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the two algorithms are however very different and TIAR appears to be preferable over 
IAR, in general. 

The first advantage of TIAR concerns the memory requirements. More precisely, 
in TIAR, the basis matrix is stored using a tensor 1 e C mxmxm and a ma¬ 

trix Z m £ C nxm . Therefore, TIAR requires the storage of 0{m 3 ) + 0{mn ) numbers. 
In contrast to this, in IAR we need to store 0(m 2 n) numbers since the basis matrix 
Q rn is of size mn x m. Therefore, assuming that m <C n, TIAR requires much less 
memory than IAR. 

The essential computational effort of carrying out m steps of IAR consists of: 
to linear solves, computing y^ : Mb) (0)aq, for k = 1, ... ,m, and orthogonalizing a 
vector of length kn against k vectors of size kn for k = 1,..., m. The orthogonalization 
has complexity 

(3.16) tiAR,orth(TO,n) = C>(?n 3 n), 


which is the dominating cost when the linear solves are relatively cheap as in the 
waveguide eigenvalue problem. 

On the other hand, the computationally dominating part of carrying out to steps 
of TIAR is as follows. Identical to IAR, to steps require to linear solves, and the 
computation of \ M^(0)xi , for k = 1,... ,m. The orthogonalization process in 
TIAR (Step [4]|9]) is computationally cheaper than IAR. More precisely, 

£TiAR,orth(m,n) = 0(m 2 n). 


Unlike IAR, TIAR requires a computational effort in order to access the vectors 
y 2 ,...,yk in Step s since they are implicitly given via and Zk- In Step [2] we 
compute 7 / 2 , • • • ,2/fc with (3.6) and (|3.5|) which correspond to multiplying a matrix of 


size n x k with a matrix of size k x k (and subsequently scaling the vectors). Hence, 
the operations corresponding to Step [2] for m iterations of TIAR can be carried out 


(3.17) 


tTi AR, step ^m,n) = 0(m 3 n). 


At first sight, nothing is gained since the complexity of the orthogonalization 
in IAR (3.16) and Step 2 of TIAR, are both 0(m 3 n). However, it turns out that 
TIAR is often considerably faster in practice. This can be explained as follows. In 
the orthogonalization process of IAR we must compute h = Q^y where Q k £ C fcnxfc , 
whereas in Step 2 in TIAR we must compute Z^Aj. (in (3.5)) where Zk £ C nxk and 
Afc £ C kxk . Note that the operation Z^Aj, involves nk + k 2 values, whereas Q^y 


involves nk 2 values, i.e., Step 2 in TIAR involves less data. This implies that on 
modern computer architectures, where CPU caching makes operations on small data¬ 
sets more efficient, it is in practice considerably faster to compute Z^A^ than Q^y 
although the operations have the same computational complexity. This difference is 
also verified in the simulations in section [5l 


4. Adaption to the waveguide problem. 

4.1. Cayley transformation. One interpretation of IAR involves a derivation 
via the truncated Taylor series expansion. The truncated Taylor expansion is expected 
to converge slowly for points close to branch-point singularies, and in general not 
converge at all for points further away from the origin than the closest singularity. 
Note that M defined in (1.4) has branch point singularities at the roots of /?±,fc(7), k — 


13 






—p, ■ ■ ■ ,p where j3± t k is defined in (2.3). In our situation, the eigenvalues of interest 
are close to the imaginary axis and, since the roots of [3± t k are purely imaginary, the 
singularities are purely imaginary, which suggests poor performance of IAR (as well 
as TIAR) when applied to M. 

In order to resolve this, we first carry out a Cayley transformation which moves 
the singularities to the unit circle and the eigenvalues of interest to points inside the 
unit disk, i.e., inside the convergence disk. 

In our setting, the Cayley transformation for a shift y 0 £ (—00, 0) x (—27r, 0) is 
given by 


(4.1) 

and its inverse is 

(4.2) 


, _ 7 - 7o 

^ I - ’ 

7 + 7o 


7 = 


7o + A70 
1 - A 


The shift should be chosen close the eigenvalues of interest, i.e., close to the imagi¬ 
nary axis. However, the shift should not be chosen too close to the imaginary axis, 
because this generates a transformed problem where the eigenvalues are close to the 
singularities. 

Note that the transformed problem is still a nonlinear eigenvalue problem of the 
type (1.4), and we can easily remove the poles introduced by the denominator in (4.1). 
More precisely, we work with the transformed nonlinear eigenvalue problem 


(4.3) 

where 


M( A) := 


(1 — A) 2 / 


(1 - A)/ 




To+^To \ 
1—A ) 


F a { A) F Cl (A) 

(1 - A)Cf P(A) . 


Fa{ A) (1 — A) 2 Ao + (70 + Ayo)(l — A)Ai + (70 + A7o)"A.2, 

Fc i(A) := (1 — A) 2 Ci, 0 + (70 + A 7 o )(1 — A)C'i i i + (70 + A / yo) 2 C'i i 2, 

(4.4) P(A) :=(1-A)P( 70 1 + A a 70 ). 


4.2. Efficient computation of y\. In order apply IAR or TIAR to the waveg¬ 
uide problem, we need to provide a procedure to compute yi in Step [3] of Algorithm [l] 
and Algorithm [i] using (3.2). The structure of M in (4.3) can be explicitly exploited 
and merged with Step 2 as follows. We analyze (3.2) for k > 3. It is straightforward 
to compute the corresponding formulas for k < 3. Due to the definition of M in (4.3), 
formula (3.21 can be expressed as 

(4.5) - M{0)yi = Zx + z 2 , 


with 


(4.6) 


Z\ := 

n(0)y 2 ,i + ^(0)142,2 + P^(0)2/3.i + F^(0)y 3 , 2 

-cfW 


0 


22 := 

((%+!, 2, 
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= (y?i,y?2). 


= 1 , 


where we have decomposed yj 

Vi,2 £ C 2n *. The linear system of equations (4.5) can be solved by precomputing the 


, k with yi ,i G C n * n * and 


Schur complement and its LU-factorization, which is not a dominating component (in 
terms of execution time) in our situation. The vector z\ can be computed directly by 
using the definition of Fa and Fq 1 - Using the definition of P in (4.4), we can express 
the bottom block of z 2 as 


(4.7) 


22,2 = 


R 0 
0 R 


-i k 




R- 1 

0 


0 

R- 1 


Vi+ 1,2 I £ 


'*2n. 


where D t := diag (a_ _ Pji ,..., a . j, a+, P ,i) with 


(4.8) 


a ±,j,i ■■= ( ^ (a - a)( S± j (^^)+ d 0 ) 


A=0 


In order to carry out m steps of the algorithm, we need to evaluate ( |4.8| ) 2 n z m times. 
We propose to do this with the efficient recursion formula given Appendix[C] We note 
that similar formulas are used in m for slightly different functions. 

Although the above formulas can be used directly to compute yi, further perfor¬ 
mance improvement can be achieved by considerations of Step 2. Note that the com¬ 
plexity of Step 2 in TIAR is 0(m 3 n ), as given in equation (3.17). The computational 


complexity of this step can be decreased by using the fact that in order to compute 

> Uk+ 1,2 £ C 2ra *, 


yi in Step 3 and equation (4.5 )-( |4.6[ ), we only need 2 / 2 ; 2/3 and 2 / 4 , 2 ; 
i.e., not the full vectors. The structure can exploited in the operations in Step 2 as 
follows. 

Let B u G C 2x2 ,Ri 2 £ C 2x ( fc - 2 \£ 2 i £ C( k ~ 2 '> x2 and B 22 £ C( k ~ 2 ) x ( k ~ 2 ) be 
defined as blocks of A k , 


(4.9) 


Ah = 


Bn 
B 2 1 


-B 12 

B 22 


From (3.5) and (3.6) we have 
(4.10) [y 2 2 / 3 ] = Z k 

and 


Bn 

B 2 i 


[2/2 2/3] = [2/2 2/3/2] , 


( 4 . 11 ) [i/4, 2 ... 2/fc+i, 2 ] — Z k} 2 B 22j , [?/4, 2 ... yk+1,2] — [^4,2/3 


Vk+i, 2 /k] , 


where Z k ^ 2 G C 2 ™j;x( fe - 2 ) consists of the trailing block of Z k . 

By using formulas (4.10)-(4.11), we merge Step 2 and Step 3 in Algorithm [ 2 ] such 
that we can compute 2/1 without computing the full vectors y 2 .... .y k - For future 
reference we call this adaption WTIAR. 

As explained in Remark |3.3[ Step 2 of TIAR is the dominating component in 


terms of asymptotic complexity. With the adaption explained in (4.9)-(4.11), the 
complexity of Step 2 in WTIAR is 


(4.12) 


1 wtiar, step 2 (m,n) = 0{nm 2 ) +0(n z m 3 ). 


If the problem is discretized with the same number of discretization points in x- 
direction and ^-direction, we have twTiAR,step 2 (m,n) = 0(nm 2 ) +0(^/nm 3 ), which 
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is considerable better than the complexity (3.171, i.e., the complexity of Step 2 in 
the plain TIAR. Notice that when n is sufficiently large the dominating term of 
the complexity of WTIAR is 0(nm 2 ) which is also the complexity of the Arnoldi 
algorithm for the standard eigenvalue problem. The complexity is verified in practice 
in section [5] 

5. Numerical experiments. 


5.1. Benchmark example. In order to illustrate properties of our approach, 
we consid er a waveguide previously analyzed in EUi- We set the wavenumber as 


in Figure 5.1 where K\ = x/273 w, K 2 = y/3 lo, K 3 = 


and 


Recall that 


the task is to compute the eigenvalues in the region fl := (—00, 0) x (—27r, 0) C C, in 
particular those which are close to the imaginary axis. 



Figure 5.1: Geometry of the waveguide in Section 


5.1 


We select X- and x+ such that the interior domain is minimized, i.e., x __ = 0 
and x + = 2/7r + 0.4. The PDE is discretized with a FEM approach as explained in 
section [272) Recall that the waveguide eigenvalue problem has branch point singular¬ 
ities and that the algorithms we are considering are based on Taylor series expansion. 
As explained in section |4.1[ the location of the shift 7 in the Cayley transformation 
influences the convergence of the Taylor series, and cannot be chosen too close to the 
target, i.e., the imaginary axis. We select 70 = — 3 + *7r, i.e., in the middle of f 1 in the 
imaginary direction. The error is measured using the relative residual norm 

(5.1) E(w, 7):=—- \\ M (.-y) w \\ - 

ELo H* (II a<|| + IICmII) + IICJII + 2do + £?=-„ (ki, + (7)l + Ia,-(7)I) 


for ||w;|| = 1. 

We discretize the problem and we compute the eigenvalues of the nonlinear eigen¬ 
value problem using WTIAR^] They are reported, for different discretizations, in Ta¬ 
ble [5Tl The required CPU time is reported in Table |5.2| The solution of the problem 


The required CPU time is reported in Table 5.2 
with the finest discretization, i.e., the last row in|5.1 


was computed in more than 10 
hours. The bottleneck for the finest discretization is the memory requirements of the 
computation of the LU-factorization of the Schur complement corresponding to M(0). 
An illustration of the execution of WTIAR, for this problem, is given in Figure [u~2j 
where the domain is discretized with n T = 640 and n z = 641 and m = 100 iterations 


2 All simulations were carried out with Intel octa core i7-3770 CPU 3.40GHz and 16 GB RAM, 
except for the last two rows of Table [W] which were computed with Intel Xeon 2.0 GHz and 64 GB 
RAM. 
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-3-2-10123 

x 

(b) Absolute value of the eigenfunction 



Ru 

_ Ru 





-300 -200 -100 0 100 200 300 


(a) Relative residual norm history 


(c) Fourier coefficients decay of u(x~ , z) and 
u(X-\-, z) 


Figure 5.2: Execution of m = 100 iterations of WTIAR. The parameters of the DtN 
maps are X- = 0 and x + = 2/n + 0.4. The domain is discretized setting n x = 640 
and n z = 641. 


are performed. We observe in Figure 5.2a that two Ritz values converge within the 
region of interest and two additional approximations converge to values with positive 
real part and of no interest in this application. 

In Figure [5.2c we observe that the Fourier coefficients do not have exponential 
decay for u(x+,z). Indeed, the decay is quadratic, which is consistent with the fact 


that the solutions are (^-functions, but in general not C 2 , as explained in Remark 2.1 


In particular, the second derivative of the eigenfunction is not continuous in x = x + . 
Hence, the eigenfunctions appear to have just weak regularity, which means that the 
waveguide eigenvalue problem does not have a strong solution. This supports the 
choice of the discretization method, based on the FEM, that we use in this paper. In 
these simulations we selected x± such that the enterior domain is minimized. We also 
carried out simulations for larger interior domains, without observing any qualitive 
difference in the computed solutions. By Remark 2.1 this suggests that the C 1 - 
assumption is not a restriction in this case. 

The plot of the absolute value of one eigenfunction is given in Figure |5.2b| The 
convergence rate with respect to discretization, appears to be quadratic in the diam¬ 
eter of the elements. See Table loTTl 

As we mentioned in Remark |3.3[ TIAR requires less memory and has the same 
complexity of IAR, although it is in practice considerable faster. According to sec¬ 
tion |4.2[ WTIAR requires the same memory resources as TIAR, but WTIAR has 
lower complexity. These properties are illustrated in Figure [5.3a| and Table [5721 

As we showed in the theorem 3.2 TIAR and IAR are mathematically equivalent 
by construction. However, IAR and TIAR (as well as WTIAR) incorporate orthog- 
onalization in different ways which may influence the impact of round-off errors. It 
turns out that the H m matrices computed with IAR and TIAR are numerically differ¬ 
ent, but the Ritz values in H have a small difference. See Figure [A3bJ This suggests 
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Problem size 

n x 

n z 

First eigenvalue 

Second eigenvalue 

132 

10 

ii 

-0.010297987 - 4.966269257i 

-0.008202089 - 1.390972357i 

462 

20 

21 

-0.009556975 - 4.965939619i 

-0.009012367 - 1.337899343i 

1,722 

40 

41 

-0.009401369 - 4.965933116i 

-0.009258151 - 1.322687924i 

6,642 

80 

81 

-0.009368285 - 4.966067569i 

-0.009332752 - 1.3185118331 

26,082 

160 

161 

-0.009359775 - 4.966072322i 

-0.009350769 - 1.317465909i 

103,362 

320 

321 

-0.009357649 - 4.96607181U 

-0.009355348 - 1.317202268i 

411,522 

640 

641 

-0.009357159 - 4.966073495i 

-0.009356561 - 1.317134070i 

1,642,242 

1,280 

1,281 

-0.009357028 - 4.966073418i 

-0.009356859 - 1.3171174431 

6,561,282 

2,560 

2,561 

-0.009356994 - 4.966073409i 

-0.009356933 - 1.317113346i 

9,009,002 

3,000 

3,001 

-0.009356991 - 4.966073406i 

-0.009356938 - 1.3171129051 


Table 5.1: Eigenvalue approximations stemming from WTIAR with m = 100. 




(a) CPU time needed to perform m = 100 itera- (b) Difference between the matrices H m and Ritz 
tions of IAR, TIAR and WTIAR values in computed with IAR and WTIAR 

Figure 5.3: Comparison between IAR, TIAR and the WTIAR in terms of CPU time 
and stability 


that there is an effect of the roundoff errors, but for the purpose of computing the 
Ritz values located in Q, such error is benign. Moreover, since TIAR requires less 
operations in the orthogonalization and implicitly imposes the structure, the roundoff 
errors might even be smaller for TIAR than IAR. 

We mentioned in section |4.2[ that when n became sufficiently large, the dom¬ 
inating part of WTIAR is the orthogonalization process. This can be observed in 
Figure 5.4 Recall that the orthogonalization in WTIAR has complexity 0(nm 2 ), 
which is also the complexity of the standard Arnoldi algorithm. Hence, solving the 
waveguide eigenvalue problem with WTIAR using a fine discretization, has in this 
sense the same complexity as solving a standard eigenvalue problem of the same size 
using the Arnoldi algorithm. According to remark |3.3| the dominating part of IAR 
is also the orthogonalization process, but this has higher complexity 0(nm 3 ). See 
Figure |5.4^t. 


5.2. Waveguide with complex shape. In order to show the generality of our 
algorithm, we carried out simulations on a waveguide with a more complex geometry 
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CPU time 

storage of Q m 

n 

n x 

n z 

IAR 

WTIAR 

IAR 

TIAR 

462 

20 

21 

8.35 secs 

2.58 secs 

35.24 MB 

7.98 MB 

1,722 

40 

41 

28.90 secs 

2.83 secs 

131.38 MB 

8.94 MB 

6,642 

80 

81 

1 min and 59 secs 

4.81 secs 

506.74 MB 

12.70 MB 

26,082 

160 

161 

8 mins and 13.37 secs 

13.9 secs 

1.94 GB 

27.52 MB 

103,362 

320 

321 

out of memory 

45.50 secs 

out of memory 

86.48 MB 

411,522 

640 

641 

out of memory 

3 mins and 30.29 secs 

out of memory 

321.60 MB 

1,642,242 

1280 

1281 

out of memory 

15 mins and 20.61 secs 

out of memory 

1.23 GB 


Table 5.2: CPU time and estimated memory required to perform m = 100 iterations 
of IAR and WTIAR. The memory requirements for the storage of the basis is the 
same TIAR and WTIAR. 



-*— Orthogonalization 
h— Linear Solves 
-0— Other 



-#— Orthogonalization 
^— computation of y 2 ... y m 
h— Linear Solves 
Other 


(a) IAR (b) WTIAR 

Figure 5.4: CPU time (percent) of the main parts of IAR and WTIAR with m = 100 
iterations. 


5.5 


where K\ = y/2 (3<u, K 2 = 2-\/3u K 3 = 


and solutions. It is described in Figure 
4\/3 u and K 4 = u> and ui = n. 

We again select x_ and x + such that the interior domain is minimized, i.e., x_ = 0 
and = 2. We discretize the problem and choose the same discretization parameters 
as in section 5.1 and choose as shift 70 = —2 — in. An illustration of the execution 
of WTIAR, for this problem, is given in Figure 5.6 where the domain is discretized 
with n x = 640 and n z = 641 and m = 100 iterations are performed. We observe 
that several Ritz values converge within the region of interest U. See Figure 5.6b|and 


Figure [5.6a| One of the dominant eigenfunctions is illustrated in Figure [5.6c 


6. Concluding remarks and outlook. We have in this paper presented a new 
general approach for NEPs and shown how to specialize the method to a specific the 
waveguide eigenvalue problem stemming from analysis of wave propagation. Note 
that the non-polynomial nonlinearity stems from the absorbing boundary conditions. 
In our setting we were able establish an explicit characterization of the DtN maps, 
which allowed us to incorporate the structure at an algorithmic level. This approach 
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Figure 5.5: Geometry of the waveguide 
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10 “ 


- — Ritz values in Q 
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Iteration k 


100 


(a) Eigenvalues (converged Ritz values) and sin- (b) Relative residual norm history 

gularities 



x 

(c) Absolute value of the eigenfunction 


Figure 5.6: Execution of to = 100 iterations of WTIAR. The parameters of the DtN 
maps are X— = 0 and x+ = 2. The domain is discretized setting n x = 640 and 
n z = 641. 


does not appear to be restricted to the waveguide problem. Many PDEs can be 
constructed with absorbing boundary conditions expressed in a closed forms. By 
appropriate analysis, in particular differentiation with respect to the eigenvalue, the 
approach should carry over to other PDEs and other absorbing boundary conditions. 

There exist several variants of IAR, e.g., the Chebyshev version [50] and restarting 
variations m- There are also related rational Krylov methods [3]. The results of 
this paper may also be extendable to these situations, although this would require 
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further analysis. In particular, all of these methods require (in some way) a quantity 
corresponding to formula y\ in (3.2), for which the problem-dependent structure must 
be incorporated. The computation of this quantity must be accurate and efficient and 
require considerable problem-specific attention. 

Appendix A. Proof of Lemma |2.2[ We consider the exterior problem on S+ in 
(2.5). The proof corresponding to S- is analogous. To simplify the notation we write 


2.1 


Pk = /3+,fc(7) and assume, without loss of generality, that x+ = 0. By Remark 
the solutions of (2.5) are in C 1 and every vertical trace can then be expanded in a 
Fourier series. Therefore we can express 

w{x, z) = E w k (x)e 2nikz , 


a(z) = E 5fcf 


,27 rikz 


kez 


By again using Remark we have that the solutions to the exterior problem are in 
C°° if x > 0 and satisfy (2.5a). Therefore, the coefficients w k satisfy 

E (w k - (2tt k) 2 w k + 47T7 iw k + (j 2 + n 2 + )w k ) e 2 ™ kz = E {w k + P k w k ) e 


2'nikz 


= 0 , 


fcez 


fee z 


where f3 k is given in (2.3). Thus, in order for w to satisfy (2.5aI, we have 

v'k + PkWk = o, 


w h 


for all k. We now claim that there are constants C, C l independent of k such that 
(A.l) \p k \ <C(l + (27rfc) 2 ), | Im \//3fc| > C' \J\ + (2nk) 2 . 

In particular, /3 k ^ 0 and 

Wk (x) = c fc e isign(Im( ^) ) ^ x + d k e~ is ign (im(fc))/&7 

To determine c k and d k we have two boundary conditions. First, since w £ H 1 (S+), 
then \w k (x)\ can not grow as x —> oo. This means that d k = 0. Second, at x = 0, we 
have w(0, z) = g(z ), so c k = g k . Hence, we have the explicit solution 

w k { x) = 5fc e isigIl( im(ft))v«i. 


Existence is thus proved, and the relationship (2.7) for the DtN maps follows directly 
for this solution by differentiating w k (x) and evaluating at x = 0. We also have 
T+ n [g] £ L 2 ( 0,1) since 

l|T + , 7 [ff]||l a( o,i) = E \fo\ l^-l 2 < + ( 27rfc ) 2 ) \9k\ 2 = C||p||/fi ([ o,i]). 

feez fcez 

Finally, the estimate ( |2.6| ) is given by 

!(s+) = IHIi 2 (s+) + ll^ w lli 2 (s + ) = E [(^ + ( 2 nk) 2 ) ||tUfc||i 2 ( 0)OO ) + ||wfe||i2( 0iOO ) 


I hh 


— E 1^-1“ (■*■ (2tt^) 2 + \Pk\) f 

JO 


fee z 

OO 


„ —2sign(Im(/3fc)) Im 


dx 


= E 


fcez 

C + l 
2 a 


2 Im 


^ ^ ra-| (l + (27rfc) 2 + |/3fc|) < E \dk\ 2 \/l + (27T fc) 2 

vPk\ fcgZ 


M\ 
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Uniqueness follows from this estimate. It remains to show the claim (A.l). The 
estimate for \ffi\ is straightforward. For the second estimate we note that Im/3 fc = 
2| Re 7(271-fc + Imq)! ^ 0 for all k from the assumptions Imq ^ 27rZ and Re 7 ^ 0. It 
follows that also Im y//3 k 7^ 0 for all k and since 


,. r InrVS 

lim a k := hm — — 

|fc |—^00 |A:|—S-oo \Jl + (27rfc) 2 


= Im , 


/3fe 


Ifcfe, 1 + (27rfc) 2 


= 1 , 


the sequence {\/a k } is bounded. Hence, there is a C such that (A.l) holds, which 
concludes the proof. 

Appendix B. Matrices of the FEM discretization. The matrices (Aj)?_ 0 
and {C i,i)f =0 are stem from to the Ritz-Galerkin discretization of the bilinear forms 
a, b and c. They can be decomposed and expressed as 

Ao 

Ai 

A 2 

Now we need to define the following tridiagonal Toeplitz matrices. Let E m be the 
tridiagonal Toeplitz matrix with diagonals consisting of = —1, e,;.,; = 2 and 

ei,j+i = — 1. Let F m be the tridiagonal Toeplitz matrix with diagonals consisting of 
fi+i,i = 1, fi,i = 4 and = 1. Let G m be the anti-symmetric tridiagonal Toeplitz 
matrix consisting of g l+ \ ^ = 1, 5^.3 = 0 and \ = —1. 

Then, we have 

D xx = —-E ni ® (Bn. + e nz ef + eiej.') , -(ei,e„ x ) $ +e„ 2 ef + eie^ 

6 h x v 6/i x '• 


D xx + D zz + K , 

C 1,0 

D xx + D zz + K 

:= 2 D z , 

Ci,i 

:= 2 D z , 

:=B, 

Cl,2 

:= B. 


= -Av- F n x ® - e^ef - eie T 

6/1 


) , As* = ~-^{ei,e nx )® (Bn* ~ en z ef - eie^ 


D, = - — F n 
12 n 

D h x h z 

B ~ ~Affi Gn - 


1 + e nz ej - eie^ 

(Gn z + en z ej - e l e n z 


)■ 


£>, = -j^(ei,e„ x ) ® ( Gn z + e„ z e'( - e 1 ef 


B = 


h x h z 

36 


(ei, —e„ x ) ® (g„, + e nz ef - e\e^ z ) . 

The matrices K and K arise from the Ritz-Galerkin discretization of the bilinear 
form 

f 1 f x+ 

f(u,v)= / / k(x, z) 2 u(x, z)v(x, z) dx dz. 

J 0 Jx- 

The elements of such matrices are obtained integrating the product of two basis 
functions against the square of the wavenumber. We can split the integral over the 
elements. Recall that k(x,z) is piecewise constant, then the final task is to compute, 
for each element, the integral of a piecewise polynomial function. Such integral is 
given in an explicit form by quadrature formulas. 

Appendix C. Computation of the derivatives in DtN-map. In the com¬ 


putation of y 1 described in section 4.2 we need the coefficients a-k,j,e- They can be 
computed with the following three-term recurrence. 

Lemma C.l (Recursion for a± j f). Suppose 70 ^ zK. Then the coefficients in 


(4.8) are explicitly given by 


(C.l) 


0 </i ,j,o do 

1 do 
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where coefficients f±,j,e satisfy the following three-term recurrence 

2 a± tj {i - 3)f±, jt i-2 + b ±J (2t - 3 )f± d , t - 1 


(C.2) 


with 


f±,j,e = 

/±j, o \A-± ,j j 

, _ 6 ±j 

7±,j,i - o 7=— 


2fc±j 


> 2 , 


" o± ,j = 7o 2 - 47rij7o - 47r 2 j 2 + 
b±,j = 270To + 47 tij(to - 70) + 8?r 2 j 2 - 2«|, 
c±j = 47rijTo - 4 tt 2 j 2 + *4 + To> 

,u>j = sign (Re(To) (Im(To) + 2 t rj')). 


Proof. By definition (4.81 


a ±,j,t ( ^ / ^) s ±j 


7o + A70 
1 — A 


A—O 


+ ( dy ^ _ 


A=0 


The computation of the second term is straightforward. The first term can be com¬ 
puted as follows. In order to compute the derivatives in zero, we now derive formulas 
for the Taylor expansion 


(1 - A )s ±J 


7o + A70 
1 - A 


+OO 




t =0 


Since all functions are analytic in the origin, there exists a neighborhood of the origin 
N, such that when A € N, 


(1 - A )s±,j 


7o + A70 
1 - A 


= Wj \J a±j A 2 + b±j A + c±j. 


Hence, we have reduced the problem to computing the power series expansion of 
•\/ a ±jA 2 + b ±jA + c±j. To this end we use the well known formula involving the 
Gegenbauer polynomials and their generating function. See e.g. m- We have that 


a ±j A 2 + b±j A + c±j = \fc±. 


c±, 


A + 


°±,j 


+ OO 


= £ 


S\ 


±,i^(-l/2) 


_ n 

Jt -1 W 
c ±-j 


y/a±jc±j W c ±,i 
b ±,j 


a±J A ) +1 


2y/ a ±,j c ±,j 


A £ , 


where Cf is the £-th Gegenbauer polynomial. Consequently, the coefficients in the 
power series expansion are 




\ 


±,j_ c (-l/ 2 ) 


L ±J 




2 y/a±,jC± ,j 


The recursion (C.2) follows from substitution of the recursion formula for Gegenbauer 
polynomials. □ 
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